
Conversation


@FabianHofmann FabianHofmann commented Sep 8, 2025

  • Remove pandas-based LP writing functions and replace with polars versions
  • Rename polars functions to remove '_polars' suffix for consistent API
  • Create separate get_printers_scalar() for non-LP functions (highspy, gurobi, mosek)
  • Update get_printers() to handle polars dataframes for LP writing
  • Consolidate "lp" and "lp-polars" io_api options to use same implementation
  • Remove unused imports and cleanup handle_batch function

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.


FabianHofmann commented Sep 8, 2025

The replacement comes with a small trade-off for smaller problems, where the pandas-based IO is still a bit faster. However, for larger problems polars speeds things up significantly. Any opinions @coroa @lkstrp?

(note that the line of the crossover point in the upper right is a bit off, but you get the message)
[figure: runtime benchmark, pandas vs. polars LP writing over problem size]

@FabianHofmann

Also tagging @fneum.

@FabianHofmann

Thinking about it, I would say we go for it. We are talking about at most ~7 ms slower in bad configurations (tiny problems) but minutes faster for large problems, and the code gets streamlined.


@lkstrp lkstrp left a comment


> Thinking about it, I would say we go for it. We are talking about at most ~7 ms slower in bad configurations (tiny problems) but minutes faster for large problems, and the code gets streamlined.

Agreed! Also the cleanup is nice

The polars migration broke NaN validation because `check_has_nulls_polars`
only checked for null values, not NaN values. In polars, these are distinct
concepts. This fix enhances the validation to detect both null and NaN values
in numeric columns while avoiding type errors on non-numeric columns.

Fixes failing tests in `test_inconsistency_checks.py` that expected a ValueError
to be raised when variables have NaN bounds.

fneum commented Sep 8, 2025

Agreed, but we should also run a memory comparison (e.g. with a PyPSA-EUR case).

@FabianHofmann

> Agreed, but we should also run a memory comparison (e.g. with a PyPSA-EUR case).

Good point, but no need to worry: we have the slicing logic, which ensures that memory requirements stay bounded.

@FabianHofmann FabianHofmann merged commit 2d69959 into master Sep 8, 2025
21 checks passed
@FabianHofmann FabianHofmann deleted the remove-pandas-lp-io branch September 8, 2025 12:14

fneum commented Sep 8, 2025

Maybe someone can do it nevertheless? I am not sure anyone has ever tested the polars writing implementation on a PyPSA-Eur-type large problem (even if it uses the same slicing logic).


coroa commented Sep 8, 2025

Does anyone know whether the polars code here stays within the confines of the narwhals compat layer (which is a subset of the full polars API)? Then switching between pandas and polars could easily be made a config option, and atlite would not have to depend on polars; it could remain optional.

@coroa coroa mentioned this pull request Sep 9, 2025

FBumann commented Nov 3, 2025

@FabianHofmann @coroa Great work.
I am seeing a ~73 % decrease in time spent in `model.to_file()` (393 s → 107 s), with `model.shape = (24_558_097, 14_991_445)`.


lkstrp commented Nov 3, 2025

The PR was breaking, though, and the release should have been 0.6.0 anyway. The two `check_has_nulls_polars` calls in `to_polars` raise if a constraint contains NaNs. This is the case for multi-investment periods in PyPSA and KVL. I'm not even sure what the check is doing there.

Also, `filter_nulls`

```python
short = filter_nulls_polars(short)
```

only filters out "empty" constraint terms (label is -1 or coefficient is 0), not actual NaNs, while the line below does not just handle "empty" constraints but actually raises on NaNs:

```python
check_has_nulls_polars(short, name=f"{self.type} {self.name}")
```

@FabianHofmann Could you point me to a fix?
Should this have been dealt with already? Should `filter_nulls_polars` remove them? What is the purpose of `check_has_nulls_polars` anyway? I don't encounter any errors if I simply remove it. We can surely make the naming clearer.
